Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion

Author

  • Scott Shaobing Chen
Abstract

In this paper, we are interested in detecting changes in speaker identity, environmental condition and channel condition; we call this the problem of acoustic change detection. The input audio stream can be modeled as a Gaussian process in the cepstral space. We present a maximum likelihood approach to detect turns of a Gaussian process; the decision of a turn is based on the Bayesian Information Criterion (BIC), a model selection criterion well-known in the statistics literature. The BIC criterion can also be applied as a termination criterion in hierarchical methods for clustering of audio segments: two nodes can be merged only if the merging increases the BIC value. Our experiments on the Hub4 1996 and 1997 evaluation data show that our segmentation algorithm can successfully detect acoustic changes; our clustering algorithm can produce clusters with high purity, leading to improvements in accuracy through unsupervised adaptation as much as the ideal clustering by the true speaker identities.
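The BIC decision described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes a single change point per window and diagonal-covariance Gaussians (a simplification chosen to keep the example in pure Python). The function names, the penalty weight `lam`, the `margin` parameter and the synthetic data generator are all hypothetical. A change is declared at the split whose two-Gaussian model has a higher BIC than the one-Gaussian model, and the same statistic serves as the merge test for clustering.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of the ML diagonal covariance of a list of feature vectors."""
    n, d = len(frames), len(frames[0])
    total = 0.0
    for j in range(d):
        mean = sum(f[j] for f in frames) / n
        var = sum((f[j] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def delta_bic(frames, i, lam=1.0):
    """BIC of 'two Gaussians split at index i' minus BIC of 'one Gaussian'."""
    n, d = len(frames), len(frames[0])
    left, right = frames[:i], frames[i:]
    # Gain in maximized log-likelihood from modeling the halves separately.
    gain = 0.5 * (n * log_det_diag(frames)
                  - len(left) * log_det_diag(left)
                  - len(right) * log_det_diag(right))
    # Assumption: diagonal covariance, so the split adds d means + d variances.
    extra_params = 2 * d
    return gain - 0.5 * lam * extra_params * math.log(n)


def detect_change(frames, margin=10, lam=1.0):
    """Return the split index with the largest positive delta-BIC, or None."""
    best_i, best_score = None, 0.0
    for i in range(margin, len(frames) - margin + 1):
        score = delta_bic(frames, i, lam)
        if score > best_score:
            best_i, best_score = i, score
    return best_i


def should_merge(cluster_a, cluster_b, lam=1.0):
    """Merge two clusters only if a single Gaussian gives the higher BIC."""
    return delta_bic(cluster_a + cluster_b, len(cluster_a), lam) < 0.0


def synthetic_stream(n_per_segment=100, dim=2, shift=5.0, seed=0):
    """Hypothetical test data: two segments whose means differ by `shift`."""
    rng = random.Random(seed)
    first = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(n_per_segment)]
    second = [[rng.gauss(shift, 1.0) for _ in range(dim)]
              for _ in range(n_per_segment)]
    return first + second
```

Calling `detect_change(synthetic_stream())` should return an index close to the true boundary at frame 100, and `should_merge` on the two halves should reject the merge. With real cepstral features, a full covariance matrix and a tuned penalty weight would likely be needed.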


Similar resources

Improved speaker segmentation and segments clustering using the bayesian information criterion

Detection of speaker, channel and environment changes in a continuous audio stream is important in various applications (e.g., broadcast news, meetings/teleconferences, etc.). Standard schemes for segmentation use a classifier and hence do not generalize to unseen speakers, channels or environments. Recently S. Chen introduced new segmentation and clustering algorithms, using the so-called BIC. This...


Speaker change detection and speaker clustering using VQ distortion for broadcast news speech recognition

This paper addresses the problem of the detection of speaker changes and clustering speakers when no information is available regarding speaker classes or even the total number of classes. We assume that no previous information on speakers is available (no speaker model, no training phase) and that people do not speak simultaneously. The aim is to apply speaker grouping information to speaker a...


Speaker diarization for meeting room audio

This paper describes a speaker diarization system in 2007 NIST Rich Transcription (RT07) Meeting Recognition Evaluation for the task of Multiple Distant Microphone (MDM) in meeting room scenarios. The system includes three major modules: data preparation, initial speaker clustering and cluster purification/merging. The data preparation consists of the raw data Wiener filtering and beamforming, ...


Speaker diarization using normalized cross likelihood ratio

In this paper, we present the Normalized Cross Likelihood Ratio (NCLR) and the advantages of using it in a speaker diarization system. First, the NCLR is used as a dissimilarity measure between two Gaussian speaker models in the speaker change detection step, and its contribution to the performance of speaker change detection is compared with those of BIC and Hotelling's T² statistic measures. T...


Two step speaker segmentation method using Bayesian information criterion and adapted Gaussian mixtures models

This paper addresses the topic of online unsupervised speaker segmentation in a complex audio environment as it is present in the Broadcast News databases. A new two stage speaker change detection algorithm is proposed, which combines the Bayesian Information Criterion with an ABLS-SCD statistical framework where adapted Gaussian mixture models are used to achieve higher accuracy. To enhance th...



Publication date: 1998